Sense Cluster Based Categorization and Clustering of Abstracts

Authors

  • Davide Buscaldi
  • Paolo Rosso
  • Mikhail Alexandrov
  • Alfons Ciscar
Abstract

This paper focuses on the use of sense clusters for the classification and clustering of very short texts such as conference abstracts. Common keyword-based techniques are effective for very short documents only when the data pertain to different domains. Conference abstracts, however, all come from a narrow domain (i.e., they share a similar terminology), which increases the difficulty of the task. Sense clusters are extracted from the abstracts by exploiting the WordNet relationships between words in the same text. Experiments were carried out for both the categorization task, using Bernoulli mixtures for binary data, and the clustering task, by means of Stein’s MajorClust method.
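As an illustration of the clustering step named in the abstract, the following is a minimal sketch of the MajorClust idea: every node starts in its own cluster and repeatedly adopts the label that attracts it most strongly, measured by the sum of edge weights to neighbors carrying that label. It is not the authors' implementation. The function name `majorclust`, the tie-breaking rule, and the toy similarity graph are assumptions made for this example; in the paper's setting the nodes would be abstracts and the weights would come from a similarity measure over their sense clusters.

```python
from collections import defaultdict


def majorclust(weights, max_iter=100):
    """Cluster a weighted undirected graph with a MajorClust-style update.

    weights: dict mapping node -> {neighbor: similarity}, assumed symmetric.
    Returns a dict mapping node -> cluster label (an int).
    """
    # Every node starts in its own cluster.
    labels = {node: i for i, node in enumerate(sorted(weights))}
    for _ in range(max_iter):
        changed = False
        for node in sorted(weights):
            # Sum the edge weights towards each neighboring cluster.
            attraction = defaultdict(float)
            for neighbor, w in weights[node].items():
                attraction[labels[neighbor]] += w
            if not attraction:
                continue  # isolated node keeps its own label
            best = max(attraction.values())
            candidates = sorted(l for l, s in attraction.items() if s == best)
            # Assumed tie-break: keep the current label if it is among the
            # best, otherwise take the smallest label (deterministic).
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two tight triangles joined by one weak edge.
graph = {
    "a": {"b": 0.9, "c": 0.9},
    "b": {"a": 0.9, "c": 0.9},
    "c": {"a": 0.9, "b": 0.9, "d": 0.1},
    "d": {"c": 0.1, "e": 0.9, "f": 0.9},
    "e": {"d": 0.9, "f": 0.9},
    "f": {"d": 0.9, "e": 0.9},
}
labels = majorclust(graph)
print(labels)
```

On this toy graph the two triangles end up in separate clusters, because the weak bridging edge never outweighs the strong edges inside each triangle. The number of clusters is not fixed in advance, which is one reason this kind of method suits collections where the number of topics is unknown.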

Similar resources

Improving Ontology-Based Sense Folder Classification of Document Collections with Clustering Methods

In this paper we describe first results of our research on the disambiguation of user queries using ontologies for categorization. We present an approach to cluster search results by using classes or ‘Sense Folders’ (prototype categories) derived from the concepts of an assigned ontology, here MultiWordNet. Using the semantic relations provided from such a resource, we can assign categories to ...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

Shallow Text Clustering Does Not Mean Weak Topics: How Topic Identification Can Leverage Bigram Features

Text clustering and topic learning are two closely related tasks. In this paper, we show that the topics can be learnt without the absolute need of an exact categorization. In particular, the experiments performed on two real case studies with a vocabulary based on bigram features lead to extracting readable topics that cover most of the documents. Precision at 10 is up to 74% for a dataset of ...

Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers

In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method finds fuzzy cluster centers; the more points from the corresponding cluster a fuzzy center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are utilized to represent uncertain data. To compare triangular fuzzy ...

A Self-enriching Methodology for Clustering Narrow Domain Short Texts

s of Scientific Texts Using the Transition Point Technique. Proc. CICLing Conference—CICLing’06, Mexico city, Mexico, February 19–25, Lecture Notes in Computer Science 3878, pp. 536–546. Springer, Berlin. [24] Alexandrov, M., Gelbukh, A. and Rosso, P. (2005) An Approach to Clustering Abstracts. Proc. 10th Int. Conf.Application of Natural Language to Information Systems— NLDB’05, Alicante, S...

Publication date: 2006